The reinforcing influence of recommendations on global diversification

Recommender systems are promising ways to filter the overabundant information in modern society. Their algorithms help individuals to explore decent items, but it is unclear how they allocate popularity among items. In this paper, we simulate successive recommendations and measure their influence on the dispersion of item popularity by the Gini coefficient. Our results indicate that local diffusion and collaborative filtering reinforce the popularity of hot items, widening the popularity dispersion. On the other hand, the heat conduction algorithm increases the popularity of niche items and generates a smaller dispersion of item popularity. Simulations are compared to mean-field predictions. Our results suggest that recommender systems have a reinforcing influence on global diversification.

PACS numbers: 89.75.-k, 89.65.-s, 89.20.Ff

I. INTRODUCTION

Due to the rapid expansion of the internet, we are overloaded by the seemingly unlimited information on the World Wide Web. For instance, one has to choose among millions of candidate commodities when shopping online, and comprehensive exploration is infeasible. As a result, various recommendation approaches have been proposed to help filter the relevant information. For instance, popularity-based recommendations (PR), which recommend the most popular items to users, are commonly adopted in online recommender systems. However, such recommendations are not personalized, so identical items are recommended to individuals with very different tastes. By comparison, collaborative filtering (CF) makes use of collective data on individual preferences and provides personalized recommendations. So far, CF has been successfully applied to many online applications.

Recently, recommendation algorithms have been proposed from a physics perspective. For instance, diffusion is applied on the user-item bipartite network to explore items of potential interest for a user. This mass diffusion (MD) algorithm is shown to outperform CF in recommendation accuracy. However, a problem similar to that observed in PR is found in MD: diffusion-based recommendations are biased toward popular items even when individual preferences are considered. In fact, a good recommendation algorithm should recommend items of personal interest and at the same time maximize the diversity of choices.

An alternative approach based on heat conduction (HC) on the user-item graph is thus introduced [8]. This method provides users with many novel items and leads to diverse recommendation results among users. However, HC has low accuracy compared with MD. The paradox is eventually resolved by combining MD with HC in a hybrid algorithm, which can be tuned to obtain significant improvement in both recommendation accuracy and item diversity.



Though they are helpful in filtering information, recommendation algorithms may impose a reinforcing influence on the system: they guide one's choices, which influence subsequent recommendations and hence the choices of others. The influence is amplified with successive recommendations. We note that such a perspective has been employed to explain the evolution of movie popularity [10], [11], yielding predictions consistent with observed data. It is thus interesting to examine such influence in recommender systems. Unlike most existing works, which are devoted to improving recommendation accuracy, our present study takes a physics perspective and utilizes microscopic interactions to explain and predict macroscopic behaviors of recommender systems [11], [12].



In this paper, we use the Gini coefficient to measure the dispersion in item popularity [13]. We note that a small dispersion implies similar popularity among items, and hence diverse recommendations for users. We consider various conventional algorithms, including the popularity-based, collaborative filtering, mass diffusion and heat conduction algorithms. We focus on the physical aspects and study numerically and theoretically the reinforcing influence of recommendations on the dispersion of item popularity. The results indicate that MD and CF reinforce the popularity of popular items, similarly to PR. On the other hand, the heat conduction algorithm increases the popularity of niche items and generates a smaller dispersion in item popularity. Our results suggest that recommender systems have a reinforcing influence on global diversification.

FIG. 1: (Color online) The change of the Gini index with time in the APS citation system and the baby name system.

II. DISPERSION OF ITEM POPULARITY 

We quantify the influence of recommender systems by 
measuring the changes in the dispersion of item popular- 
ity after successive recommendations. If the dispersion is 
large, some items dominate in popularity and users have 
limited choices. On the other hand, if the dispersion is 
small, items have similar popularity and users enjoy di- 
verse recommendations. 

To quantify such global diversity, we make use of the Gini coefficient $G$ [3] to measure the dispersion of item popularity, as in the case of individual wealth. In addition to wealth, it has been used to measure dispersion in sociology, science and engineering. Mathematically, it is given by

$$G = 1 - 2\int_0^1 C(x)\,dx, \tag{1}$$

where $C(x)$ is the normalized cumulative popularity when items are ranked in ascending order of popularity, with $x$ being the normalized rank. Specifically, $G = 0$ corresponds to uniform popularity among items, while $G = 1$ corresponds to maximal dispersion.
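As an illustration of Eq. (1), the following is a minimal sketch of how $G$ could be evaluated from a list of item degrees. The function name `gini` and the use of the trapezoidal rule on the discrete cumulative curve are assumptions made for this example and are not taken from the paper.

```python
import numpy as np

def gini(popularity):
    """Gini coefficient G = 1 - 2 * integral of C(x) over [0, 1]."""
    k = np.sort(np.asarray(popularity, dtype=float))  # ascending popularity
    total = k.sum()
    if total == 0:
        return 0.0  # no links at all: treat popularity as uniform
    # Normalized cumulative popularity C(x) at x = 0, 1/M, ..., 1.
    c = np.concatenate(([0.0], np.cumsum(k) / total))
    # Trapezoidal rule on a uniform grid with spacing 1/M.
    area = (c[:-1] + c[1:]).sum() / (2 * len(k))
    return 1.0 - 2.0 * area
```

With a finite number $M$ of items, the largest value this discrete estimate can reach is $1 - 1/M$, which approaches 1 only for large $M$.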

To see how the Gini coefficient quantifies changes in popularity dispersion, we study as examples scientific citation data and baby name data. The scientific citation data are based on the citation relations in the APS (American Physical Society) journals from 1893 to 2009 [15], and the baby name data are based on the first names recorded by the US Social Security Administration, containing the top 1000 boy and girl names for every year from 1880 to 2009 [16]. What interests us most is how the dispersion changes with time in these two systems. The results are reported in Fig. 1, from which we can see that the Gini coefficient keeps increasing in the APS citation system while decreasing in the baby name system. Due to technological advances, people have access to far more information than before. Good papers can thus spread more widely and are cited more, which leads to a larger dispersion and hence lower global diversity. Similarly, parents know more candidate names for babies, and the baby name system shows increasing diversification.

The above examples show that the changes in global diversity are well captured by the Gini coefficient. We thus make use of the Gini coefficient to examine the influence of recommender systems on popularity dispersion.

FIG. 2: (Color online) An illustrative example of the evolution of the bipartite network. The red node corresponds to the active user, and the red link corresponds to the choice made by the user according to recommendation results.

III. THE REINFORCING INFLUENCE OF 
RECOMMENDER SYSTEMS 

We investigate in this section the influence of recommender systems on global diversity by examining the dispersion in item popularity. Here, we consider four recommendation algorithms: mass diffusion (MD), heat conduction (HC), user-based collaborative filtering (UCF) and item-based collaborative filtering (ICF). In addition, we consider two benchmark algorithms, popularity-based recommendation (PR) and random recommendation (RR), which recommend the most popular items and random items, respectively.

We first give a brief description of the MD algorithm. Consider a system of $N$ users and $M$ items represented by a bipartite network with adjacency matrix $A$, where the element $a_{i\alpha} = 1$ if user $i$ has collected object $\alpha$, and $a_{i\alpha} = 0$ otherwise (throughout this paper we use Greek and Latin letters, respectively, for object- and user-related indices).

For a target user $i$, the algorithm starts by assigning one unit of resource to each object collected by $i$, and redistributes the resources through the user-item network. We denote by $\mathbf{f}$ the vector of initial resources on items, where $f_\alpha$ is the resource possessed by object $\alpha$. The redistribution is represented by $\tilde{\mathbf{f}} = W\mathbf{f}$, where

$$W_{\alpha\beta} = \frac{1}{k_\beta}\sum_{l=1}^{N}\frac{a_{l\alpha}a_{l\beta}}{k_l} \tag{2}$$

is the diffusion matrix, with $k_\beta = \sum_{i=1}^{N} a_{i\beta}$ and $k_l = \sum_{\alpha=1}^{M} a_{l\alpha}$ denoting the degree of object $\beta$ and user $l$, respectively. Technically, recommendations for a given user $i$ are obtained by setting the initial resource vector $\mathbf{f}^i$ in accordance with the objects the user has already collected, that is, by setting $f^i_\alpha = a_{i\alpha}$. The resulting recommendation list of uncollected objects is then sorted according to $\tilde{f}^i_\alpha$ in descending order. Physically, the diffusion is equivalent to a three-step random walk starting with $k_i$ units of resource on the target user $i$. The recommendation score of an item is taken to be the resource on the item after the diffusion. The scores of objects that user $i$ has already collected are set to 0, so that the recommendation list for user $i$ ranks only his/her uncollected objects in descending order of their final resources.
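The diffusion described by Eq. (2) can be sketched with dense matrices as follows. This is only an illustrative sketch: the function name `mass_diffusion_scores`, the $N \times M$ array layout and the handling of zero-degree nodes are assumptions of this example, not details given in the paper.

```python
import numpy as np

def mass_diffusion_scores(A, i):
    """MD scores for user i; A is an N x M 0/1 user-item array (assumed layout)."""
    A = np.asarray(A, dtype=float)
    k_user = A.sum(axis=1)  # user degrees k_l
    k_item = A.sum(axis=0)  # item degrees k_beta
    f = A[i].copy()         # one unit of resource on each collected item
    # Items -> users: every item splits its resource equally among its users.
    per_item = np.divide(f, k_item, out=np.zeros_like(f), where=k_item > 0)
    e = A @ per_item
    # Users -> items: every user splits its resource equally among its items.
    per_user = np.divide(e, k_user, out=np.zeros_like(e), where=k_user > 0)
    scores = A.T @ per_user
    scores[A[i] > 0] = 0.0  # already collected items are not recommended
    return scores
```

Before the collected items are zeroed out, the scores sum to $k_i$, reflecting the conservation of resource during diffusion.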

The HC algorithm works similarly to the MD algorithm, but instead of a diffusion process, the scores are evaluated by a conduction process represented by

$$W_{\alpha\beta} = \frac{1}{k_\alpha}\sum_{l=1}^{N}\frac{a_{l\alpha}a_{l\beta}}{k_l}. \tag{3}$$

Physically, the temperature of an object is taken to be the average temperature of its nearest neighbors, i.e., the nodes connected to it. The higher the temperature of an item, the higher its recommendation score.
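For comparison with mass diffusion, the conduction process of Eq. (3) can be sketched in the same way, replacing the splitting of resource by averaging of temperature. The function name `heat_conduction_scores` and the array layout are again assumptions made for illustration.

```python
import numpy as np

def heat_conduction_scores(A, i):
    """HC scores for user i; A is an N x M 0/1 user-item array (assumed layout)."""
    A = np.asarray(A, dtype=float)
    k_user = A.sum(axis=1)  # user degrees k_l
    k_item = A.sum(axis=0)  # item degrees k_alpha
    f = A[i].copy()         # temperature 1 on collected items, 0 elsewhere
    # Users take the average temperature of the items they collected.
    e = np.divide(A @ f, k_user, out=np.zeros_like(k_user), where=k_user > 0)
    # Items take the average temperature of the users who collected them.
    scores = np.divide(A.T @ e, k_item, out=np.zeros_like(k_item), where=k_item > 0)
    scores[A[i] > 0] = 0.0  # already collected items are not recommended
    return scores
```

Because of the division by the item degree in the last step, an item reached through the same users obtains a higher score when its degree is small, which is the origin of the preference of HC for niche items.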

The CF algorithms provide recommendations based on user or item similarities. They are divided into two main categories: user-based CF and item-based CF. In UCF, the recommendation score of an item is evaluated based on the similarity between the target user and the users who collected the item. The final recommendation score for each item can be written as

$$f^i_\alpha = \sum_{j=1}^{N} s_{ij}\,a_{j\alpha}, \tag{4}$$

where $s_{ij}$ is the similarity between users $i$ and $j$.

In ICF, the recommendation score of an item is evaluated based on its similarity with the items collected by the target user. Similarly, the final recommendation score for each item can be written as

$$f^i_\alpha = \sum_{\beta=1}^{M} s_{\alpha\beta}\,a_{i\beta}, \tag{5}$$

where $s_{\alpha\beta}$ is the similarity between items $\alpha$ and $\beta$.

The measure of similarity used in CF is subject to definition. Here we define the similarity as the number of common neighbors [17] in the bipartite network.
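With common neighbors as the similarity, both CF scores reduce to products of the adjacency matrix. The sketch below illustrates this under the assumption that $A$ is stored as an $N \times M$ array; the function names `ucf_scores` and `icf_scores` are hypothetical. The self-similarity terms are kept, since they only contribute to items the user has already collected, whose scores are set to zero anyway.

```python
import numpy as np

def ucf_scores(A, i):
    """User-based CF: score of item a is the sum over users j of s_ij * a_ja."""
    A = np.asarray(A, dtype=float)
    S = A @ A.T             # s_ij = number of items collected by both i and j
    scores = S[i] @ A
    scores[A[i] > 0] = 0.0  # already collected items are not recommended
    return scores

def icf_scores(A, i):
    """Item-based CF: score of item a is the sum over items b of s_ab * a_ib."""
    A = np.asarray(A, dtype=float)
    S = A.T @ A             # s_ab = number of users who collected both a and b
    scores = S @ A[i]
    scores[A[i] > 0] = 0.0
    return scores
```

With this particular similarity the two scores coincide, because both equal the $i$-th row of $A A^{T} A$; they would differ for normalized similarities such as the cosine index.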

With the above-mentioned algorithms, we consider the following scenario of recommender systems. At every step a random user is selected as the active user, for whom the recommendation scores of all items are evaluated. For simplicity, we assume that the active user accepts the recommendation results and selects the uncollected item with the highest recommendation score, i.e., a link is added between the active user and that item in the bipartite network. An illustrative example is shown in Fig. 2. The red node corresponds to the active user, and the red link corresponds to the choice made by the user according to recommendation results.

In one macro-step of our simulation, we randomly choose 10 percent of the users as active users. After each macro-step, we evaluate the dispersion of item popularity by the Gini coefficient. Note that we do not consider the growth of the system, since introducing new users or items may involve the cold-start problem for them [18]. The datasets we examine are subsets of data obtained from four online systems: Movielens, Netflix, Delicious and Amazon. These data are random samples of the full records of user activities on these websites; their descriptions are given in Table I.
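One macro-step of this procedure might be sketched as below. Several details are assumptions of this example rather than statements of the paper: active users are drawn without replacement, the network is updated immediately after each choice, and ties are broken in favor of the item with the lowest index. The names `macro_step` and `popularity_scores` are hypothetical; any function returning one score per item can be passed as `score_fn`.

```python
import numpy as np

def macro_step(A, score_fn, rng, fraction=0.1):
    """Let a random fraction of users each accept their top recommendation.

    A is an N x M 0/1 array that is modified in place.
    """
    n_users = A.shape[0]
    n_active = max(1, int(round(fraction * n_users)))
    for i in rng.choice(n_users, size=n_active, replace=False):
        uncollected = np.flatnonzero(A[i] == 0)
        if uncollected.size == 0:
            continue  # this user has already collected every item
        scores = score_fn(A, i)
        best = uncollected[np.argmax(scores[uncollected])]
        A[i, best] = 1  # add the link between the active user and the item
    return A

def popularity_scores(A, i):
    """PR benchmark: the score of an item is its current degree."""
    return A.sum(axis=0).astype(float)
```

Repeating `macro_step` and recording the Gini coefficient of `A.sum(axis=0)` after each call gives one realization of the curves discussed in the text.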



TABLE I: Description of the data.

Network     Users   Items    Links     Sparsity
Movielens     943    1,682    82,520   $5.20 \times 10^{-2}$
Netflix     3,000    3,000   197,248   $2.19 \times 10^{-2}$
Delicious   1,000   18,700    63,290   $3.40 \times 10^{-3}$
Amazon      5,000   12,377    36,391   $5.88 \times 10^{-4}$


We show in Fig. 3 the evolution of the Gini coefficient in simulations as a function of macro-step. As we can see, the Gini coefficient increases in the presence of the MD, UCF and ICF algorithms. This corresponds to their reinforcing influence on the system, leading to a wider dispersion of item popularity after successive recommendations. Further evidence can be seen in Fig. 4, which shows that popular items become more popular while the rest of the items are neglected. This is an undesired influence, as the choices and visions of users become more limited in the presence of these recommendation algorithms.

We can further understand the reinforcing influence of the MD, UCF and ICF recommender systems by comparing their Gini coefficients with that of the unpersonalized popularity-based algorithm. As shown in Fig. 3, the four algorithms show similar trends. We can examine the underlying reasons in Fig. 4, which shows that the MD, UCF, ICF and PR algorithms recommend to users only the most popular items. These results imply that the changes in the distribution of item popularity are similar for these four algorithms. Therefore, the personalized elements in the MD, UCF and ICF algorithms do not increase the global diversity as compared to the unpersonalized PR algorithm.

On the other hand, the HC algorithm behaves quite differently from the other algorithms. As we can see in Fig. 3, it generally decreases the Gini coefficient in Movielens and Netflix, where the density of links is high. In sparse systems, the three-step conduction process can only reach some items with large degree, and inevitably adds links to hot items, which leads to an increasing Gini coefficient. Moreover, the HC method also differs from random recommendation, as we can see from Fig. 4. Instead of adding links uniformly, it tends to add links to items with small degree. This implies that the HC algorithm does not reinforce the popularity of hot items as the MD and CF algorithms do.

FIG. 3: (Color online) The change of the Gini coefficient for item popularity when using different recommendation methods in real systems. The results are averaged over 100 independent realizations.

FIG. 4: (Color online) The item popularity increment when using different recommendation methods in real systems. The results are averaged over 100 independent realizations.



IV. THE MEAN-FIELD APPROXIMATION 

To better understand the influence of recommender systems, we derive analytically the distribution of item scores after the recommendation processes. The major difficulty in the analysis comes from the particular network topology of each dataset, which embeds non-trivial correlations between users and items [19], [20]. Here we focus on the influence of recommendations, and assume a simple topology where users and items are randomly connected [21]. This corresponds to a crude mean-field approximation, but such an assumption facilitates the analysis and the illustration of the physical behaviors underlying the recommendation algorithms.

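To begin our analysis, we derive the probability $p_{i\alpha}$ that a user $i$ and an item $\alpha$ are connected in a random graph. Suppose we start with $k_i$ cavities on user $i$ and $k_\alpha$ on item $\alpha$, which are respectively the degrees of $i$ and $\alpha$. If one cavity is picked randomly among the items, the probability that $\alpha$ is picked is $k_\alpha/\sum_{\beta=1}^{M} k_\beta$. It implies that

$$p_{i\alpha} = 1 - \left(1 - \frac{k_\alpha}{\sum_{\beta=1}^{M} k_\beta}\right)^{k_i},$$

where $\left(1 - k_\alpha/\sum_{\beta=1}^{M} k_\beta\right)^{k_i}$ is the probability that $i$ is not connected to $\alpha$. As $\sum_{\beta=1}^{M} k_\beta \gg k_\alpha$, expansion to the first order of $k_\alpha$ leads to

$$p_{i\alpha} \approx 1 - \left(1 - \frac{k_i k_\alpha}{\sum_{\beta=1}^{M} k_\beta}\right) = \frac{k_i k_\alpha}{c},$$

where $c = \sum_{\beta=1}^{M} k_\beta$ is the total number of links in the bipartite network.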
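The quality of the first-order expansion can be checked numerically. The short sketch below compares the exact expression with $k_i k_\alpha / c$; the function names and the example degrees are arbitrary choices made for illustration and do not come from the datasets.

```python
def p_exact(k_i, k_alpha, c):
    """Connection probability 1 - (1 - k_alpha / c) ** k_i."""
    return 1.0 - (1.0 - k_alpha / c) ** k_i

def p_first_order(k_i, k_alpha, c):
    """First-order approximation k_i * k_alpha / c, valid when c >> k_i * k_alpha."""
    return k_i * k_alpha / c

# Illustrative (made-up) degrees for a sparse network.
exact = p_exact(20, 50, 100_000)
approx = p_first_order(20, 50, 100_000)
relative_error = abs(exact - approx) / approx
```

For these values the two expressions agree to within one percent; the approximation degrades as $k_i k_\alpha$ becomes comparable to $c$.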

We then derive the mean-field expression of the recommendation scores in the MD recommender system. As mentioned above, the MD method is based on a three-step diffusion. The resource vectors for items in the first and last steps are denoted respectively by $\mathbf{f}$ and $\tilde{\mathbf{f}}$. In the second step, the resources are on the users' side and the corresponding vector is denoted by $\mathbf{e}$. By considering the last step of the diffusion process, the score of $\alpha$ for user $i$ is given by $f^i_\alpha = (1 - p_{i\alpha})\sum_{j=1}^{N} e_j\,p_{j\alpha}/k_j$. Substitution of $p_{j\alpha} = k_j k_\alpha/c$, together with the conservation of resources $\sum_{j=1}^{N} e_j = k_i$, leads to

$$f^i_\alpha = \left(1 - \frac{k_i k_\alpha}{c}\right)\frac{k_i k_\alpha}{c}. \tag{6}$$



Next we derive the scores for the HC algorithm by again considering the last step of the conduction process. However, the total "resource" is not conserved in heat conduction; instead, the temperature of user $j$ is given by $e_j = \frac{1}{M}\sum_{\beta=1}^{M} f^i_\beta = k_i/M$, which reflects the random choice of the items initially collected by $i$. Therefore,

$$f^i_\alpha = \left(1 - \frac{k_i k_\alpha}{c}\right)\sum_{j=1}^{N}\frac{k_j k_\alpha}{c}\,\frac{k_i}{M k_\alpha} = \left(1 - \frac{k_i k_\alpha}{c}\right)\frac{k_i}{M}. \tag{7}$$

In the user-based CF, the score of an item is evaluated by the similarity between the target user and the users who have collected it. The user similarity is given by the number of common neighbors. Therefore, $f^i_\alpha = (1 - p_{i\alpha})\sum_{j} s_{ij}\,p_{j\alpha}$, where $s_{ij} \approx k_i k_j/M$ in the mean-field approximation. The score for object $\alpha$ is then approximated by

$$f^i_\alpha = \left(1 - \frac{k_i k_\alpha}{c}\right)\sum_{j=1}^{N}\frac{k_i k_j}{M}\,\frac{k_j k_\alpha}{c} = \left(1 - \frac{k_i k_\alpha}{c}\right)\frac{k_i k_\alpha b}{cM}, \tag{8}$$

where $b = \sum_{j=1}^{N} k_j^2$ is a constant for a given network. Similarly to user-based CF, the item similarity in item-based CF can be approximated by $s_{\alpha\beta} \approx k_\alpha k_\beta/N$ in the mean-field approximation. The score for object $\alpha$ is then approximated by

$$f^i_\alpha = \left(1 - \frac{k_i k_\alpha}{c}\right)\sum_{\beta=1}^{M}\frac{k_\alpha k_\beta}{N}\,\frac{k_i k_\beta}{c} = \left(1 - \frac{k_i k_\alpha}{c}\right)\frac{k_i k_\alpha d}{cN}, \tag{9}$$

where $d = \sum_{\beta=1}^{M} k_\beta^2$ is a constant for a given network.

FIG. 5: (Color online) The simulation result and the theoretical result of the total recommendation score versus the original item degree in different recommendation engines. The simulation results are averaged over 100 independent realizations.

FIG. 6: (Color online) The change of the Gini coefficient for item popularity when adjusting the parameter $\lambda$ of the hybrid recommendation algorithm in real systems. The results are averaged over 100 independent realizations.

In order to compare the simulated results and the mean-field predictions, we evaluate the corresponding total score $F_\alpha = \sum_{i=1}^{N} f^i_\alpha$ that an item receives from all the users. As shown in Fig. 5, the mean-field approximation captures both the magnitude and the trend of the recommendation scores.

Further insights are drawn by noting that $c \gg k_i k_\alpha$ in most systems, which implies $f^i_\alpha \propto k_i k_\alpha$ in Eqs. (6), (8) and (9). Since we assume that users always accept the item with the highest recommendation score, the recommendations in the MD, UCF and ICF cases are thus similar to those of the PR algorithm, which recommends the most popular items. This again shows the reinforcing influence of these recommendation algorithms. On the other hand, Eq. (7) suggests $f^i_\alpha \propto k_i$ in the HC algorithm, which is item independent as in the case of random recommendations.

Though the approximated scores of HC agree well with RR, their behaviors differ in terms of the choices of items. According to $f^i_\alpha = (1 - p_{i\alpha})\frac{1}{k_\alpha}\sum_{j=1}^{N} e_j\,p_{j\alpha}$, users select the reachable items with the lowest degree after the three-step conduction, rather than making a random choice. Therefore, the HC and RR algorithms show different influences on the dispersion of item popularity, as we can see in Fig. 3(c) and (d).
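The qualitative difference between the mean-field scores can be made concrete with a small numerical sketch. The code below evaluates the four mean-field expressions for a single user on a synthetic degree sequence; the function name `mean_field_scores`, the dictionary layout and the example degrees are assumptions made for illustration only.

```python
import numpy as np

def mean_field_scores(k_i, k_items, k_users):
    """Mean-field scores of all items for a user of degree k_i."""
    k_items = np.asarray(k_items, dtype=float)
    k_users = np.asarray(k_users, dtype=float)
    c = k_items.sum()                    # total number of links
    n_users, n_items = len(k_users), len(k_items)
    b = (k_users ** 2).sum()
    d = (k_items ** 2).sum()
    not_linked = 1.0 - k_i * k_items / c # factor (1 - p_i_alpha)
    return {
        "MD": not_linked * k_i * k_items / c,
        "HC": not_linked * k_i / n_items,
        "UCF": not_linked * k_i * k_items * b / (c * n_items),
        "ICF": not_linked * k_i * k_items * d / (c * n_users),
    }

# Synthetic example: item degrees 1..100 and 101 users of degree 50 (5050 links).
scores = mean_field_scores(5, np.arange(1, 101), np.full(101, 50))
top_item = {name: int(np.argmax(s)) for name, s in scores.items()}
```

In this example the MD, UCF and ICF scores are largest for the item with the highest degree, whereas the HC score is nearly flat and, through the factor $1 - k_i k_\alpha / c$, marginally favors the item with the lowest degree.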



V. STEADY GINI COEFFICIENT BY HYBRID 
RECOMMENDATIONS 

As we have seen in the previous sections, the MD algorithm reinforces the popularity of hot items and limits available choices, while the HC algorithm recommends items with low popularity and increases global diversity. It is thus interesting to examine the influence on diversity when these two algorithms with opposite influences are combined. We thus adopt the previously proposed hybrid algorithm of MD and HC, with the new recommendation score $h_\alpha$ given by



(10) 



The parameter $\lambda$ adjusts the relative weight between the two algorithms. When $\lambda$ increases from 0 to 1, the hybrid algorithm changes gradually from HC to MD. We remark that though Eq. (10) corresponds to a linear combination of scores, the hybrid algorithm is a non-linear combination of HC and MD, as users select only the items with the highest scores.
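A linear combination of this kind can be sketched as follows. The exact form of Eq. (10) is not reproduced here, so the rescaling of each score vector by its maximum before mixing is purely an assumption of this example, as are the function name `hybrid_scores` and the parameter name `lam`; only the limits (HC at 0, MD at 1) are taken from the text.

```python
import numpy as np

def hybrid_scores(md, hc, lam):
    """Assumed hybrid: (1 - lam) * rescaled HC score + lam * rescaled MD score."""
    md = np.asarray(md, dtype=float)
    hc = np.asarray(hc, dtype=float)

    def rescale(x):
        # Assumption: divide by the maximum so both scores lie in [0, 1].
        top = x.max()
        return x / top if top > 0 else x

    return (1.0 - lam) * rescale(hc) + lam * rescale(md)

# Toy scores for three items: MD prefers item 1, HC prefers item 2.
md_toy = [0.0, 0.2, 0.1]
hc_toy = [0.0, 0.5, 1.0]
choice_hc = int(np.argmax(hybrid_scores(md_toy, hc_toy, 0.0)))
choice_md = int(np.argmax(hybrid_scores(md_toy, hc_toy, 1.0)))
```

Because only the top-ranked item is selected, the chosen item switches abruptly at some intermediate value of the weight even though the scores themselves vary linearly.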

The influence of the hybrid algorithm on the Gini coefficient is shown in Fig. 6 as a function of the hybrid parameter $\lambda$. The lines with different symbols correspond to the Gini coefficient measured after an increasing number of macro-steps. As we can see from Fig. 6(a) and (b), the Gini coefficient increases with $\lambda$, corresponding to a transition from the HC to the MD recommender system. It is interesting to note that the Gini coefficient shows a significant increase over a short range of $\lambda$ on the Netflix and Movielens datasets, and becomes saturated afterwards. The saturated Gini coefficient corresponds to the dominance of the MD algorithm, such that only popular items are recommended despite the presence of the HC algorithm. Similar behaviors are not observed in Fig. 6(c) and (d) for the Delicious and Amazon datasets, which are sparse compared to the Netflix and Movielens datasets.

Another interesting behavior is noted in Fig. 6(a) and (b) when we compare the Gini coefficient after different numbers of macro-steps of recommendations. As we can see, the lines with different symbols intersect at a particular value of the hybrid parameter $\lambda$, suggesting a steady Gini coefficient under the reinforcement of recommendations. This value of $\lambda$ thus corresponds to a balance between the HC and MD algorithms, leading to a steady dispersion in item popularity. This is desirable when one considers the reinforcing influence on global diversity as an undesired side effect of recommender systems. These values of $\lambda$ and the corresponding Gini coefficients are compared respectively to the values of $\lambda$ with optimal recommendation accuracy and to the Gini coefficient before the recommendation algorithms are implemented. These results show that high recommendation accuracy does not always guarantee global diversity, leading to a paradox in recommendations.



VI. CONCLUSION 

Recommendation is an effective way to solve the problem of excess information. However, it is unclear how recommendation algorithms allocate popularity among items. In this paper, we simulate successive recommendations and measure their influence on the dispersion of item popularity by the Gini coefficient. Our results indicate that local diffusion and collaborative filtering reinforce the popularity of hot items, widening the popularity dispersion. On the other hand, the heat conduction algorithm increases the popularity of niche items and generates a smaller dispersion of item popularity. Simulations are compared to the mean-field approximation. Our results indicate that recommender systems have a reinforcing influence on global diversification. This work provides a deeper understanding of these recommendation methods, highlights the importance of global diversity, and may shed some light on developing new recommendation methods that can directly control global diversity.